On Maximum-reward Motion in Stochastic Environments
Authors
Abstract
In this thesis, we consider the problem of an autonomous mobile robot operating in a stochastic reward field with the goal of maximizing the total reward collected in an online setting. This is a generalization of the problem in which an unmanned aerial vehicle (UAV) collects data from randomly deployed unattended ground sensors (UGS). Specifically, the rewards are assumed to be generated by a Poisson point process. The robot has a limited perception range, so it discovers the reward field on the fly. The robot is modeled as a dynamical system with substantial drift in one direction, e.g., a high-speed airplane, so it cannot traverse the entire field. The task of the robot is to maximize the total reward collected during the course of the mission, subject to the constraints above. Under these assumptions, we analyze the performance of a simple receding-horizon planning algorithm with respect to the perception range, the robot's agility, and the available computational resources. First, we show that, even with a severely limited perception range, the robot can collect as much reward as if it could see the entire reward field, if and only if the reward distribution is light-tailed. Second, we show that the expected reward collected scales proportionally to the square root of the robot's agility. Finally, we prove that the overall computational workload increases linearly with the mission length, i.e., the distance traveled. We verify these results in simulation examples. Lastly, we present an application of our theoretical study to the ground sensor selection problem: for an inference/estimation task, we prove that sensors of randomized quality outperform sensors of homogeneous precision, since, under certain technical assumptions, randomized sensors yield a higher confidence level of estimation (lower variance). This finding may have practical implications for the design of UAV-UGS systems.
Thesis Supervisor: Sertac Karaman
Title: Assistant Professor of Aeronautics and Astronautics
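The abstract does not spell out the planner itself, but the setup it describes (a Poisson point process of rewards, a limited perception window, and monotone drift in one direction) is straightforward to prototype. Below is a minimal Python sketch under those assumptions; the greedy reward-per-distance rule and all names (`sample_poisson_field`, `receding_horizon_rewards`, `agility`, `horizon`) are illustrative choices, not the algorithm analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reward field: homogeneous Poisson point process on a strip
# [0, length] x [0, width].  The point count is Poisson with mean
# intensity * area, and locations are uniform given the count.
def sample_poisson_field(length, width, intensity, reward_sampler):
    n = rng.poisson(intensity * length * width)
    xs = rng.uniform(0.0, length, n)
    ys = rng.uniform(0.0, width, n)
    return np.column_stack([xs, ys]), reward_sampler(n)

# Receding-horizon planner sketch.  The vehicle drifts in +x at unit
# speed and can shift laterally by at most `agility` per unit of
# forward travel; it sees only points within `horizon` ahead and
# greedily chases the best reward per unit of forward distance.
def receding_horizon_rewards(points, rewards, width, agility, horizon,
                             step=1.0):
    x, y = 0.0, width / 2.0
    total = 0.0
    collected = np.zeros(len(rewards), dtype=bool)
    x_max = points[:, 0].max() if len(points) else 0.0
    while x < x_max:
        # Uncollected points inside the perception window.
        visible = (~collected) & (points[:, 0] > x) & (points[:, 0] <= x + horizon)
        idx = np.flatnonzero(visible)
        # Keep only points reachable under the lateral-rate constraint.
        reach = idx[np.abs(points[idx, 1] - y) <= agility * (points[idx, 0] - x)]
        if len(reach):
            best = reach[np.argmax(rewards[reach] / (points[reach, 0] - x))]
            total += rewards[best]
            collected[best] = True
            x, y = points[best]
        else:
            x += step  # nothing reachable: coast forward
    return total

# Example draw (all parameters illustrative):
pts, rw = sample_poisson_field(length=500.0, width=20.0, intensity=0.05,
                               reward_sampler=lambda n: rng.exponential(1.0, n))
print(receding_horizon_rewards(pts, rw, width=20.0, agility=0.5, horizon=10.0))
```

Sweeping `agility` or `horizon` in this toy is the kind of experiment behind the abstract's scaling claims, though the sketch makes no attempt to reproduce the thesis's exact setup.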
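The sensor-selection result at the end of the abstract can likewise be illustrated. The sketch below assumes standard inverse-variance (best linear unbiased) fusion of independent unbiased sensors, and an exponential distribution of sensor noise variances; both are illustrative assumptions, and `compare_designs` is a hypothetical helper, not the estimator studied in the thesis. The mechanism it exercises is that, when the vehicle can choose which sensors to visit, a randomized-quality deployment lets it favor unusually precise sensors.

```python
import numpy as np

rng = np.random.default_rng(1)

def fused_variance(variances):
    # Inverse-variance fusion of independent unbiased measurements:
    # the fused estimator's variance is the inverse of summed precisions.
    return 1.0 / np.sum(1.0 / variances)

def compare_designs(n_field=100, k=10, mean_var=1.0, trials=2000):
    # Homogeneous design: every sensor has the same noise variance,
    # and any k of them fuse to the same result.
    homog = fused_variance(np.full(k, mean_var))
    # Randomized design: variances drawn i.i.d. with the same mean
    # (exponential, an assumption for illustration); the vehicle
    # visits the k most precise of the n_field deployed sensors.
    rand_vals = []
    for _ in range(trials):
        v = rng.exponential(mean_var, n_field)
        chosen = np.sort(v)[:k]  # selection favors low-variance sensors
        rand_vals.append(fused_variance(chosen))
    return homog, float(np.mean(rand_vals))

h, r = compare_designs()
print(f"homogeneous fused variance: {h:.4f}, randomized (selected): {r:.4f}")
```

The thesis proves the comparison under specific technical assumptions; this toy only sets up the two designs so they can be probed numerically.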
Similar Resources
Maximum-Reward Motion in a Stochastic Environment: The Nonequilibrium Statistical Mechanics Perspective
We consider the problem of computing the maximum-reward motion in a reward field in an online setting. We assume that the robot has a limited perception range, and it discovers the reward field on the fly. We analyze the performance of a simple, practical lattice-based algorithm with respect to the perception range. Our main result is that, with very little perception range, the robot can colle...
On time-dependent neutral stochastic evolution equations with a fractional Brownian motion and infinite delays
In this paper, we consider a class of time-dependent neutral stochastic evolution equations with infinite delay and a fractional Brownian motion in a Hilbert space. We establish the existence and uniqueness of mild solutions for these equations under non-Lipschitz conditions, with Lipschitz conditions considered as a special case. An example is provided to illustrate the theory.
Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
The multi-armed bandit problem forms the foundation for solving a wide range of on-line stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler that repeatedly pulls one out of N slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward explo...
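For readers unfamiliar with the mechanism this snippet alludes to, here is a textbook Thompson sampling loop for Bernoulli-reward arms (a generic sketch, not code from the cited paper): sampling from the posterior makes the exploration/exploitation balance implicit.

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_bernoulli(true_probs, horizon=1000):
    """Thompson sampling with independent Beta(1, 1) priors on each arm."""
    n = len(true_probs)
    wins = np.ones(n)    # Beta alpha parameters
    losses = np.ones(n)  # Beta beta parameters
    total = 0
    for _ in range(horizon):
        # Draw one plausible success rate per arm and play the argmax:
        # uncertain arms get explored, good arms get exploited.
        arm = int(np.argmax(rng.beta(wins, losses)))
        reward = rng.random() < true_probs[arm]
        total += reward
        wins[arm] += reward
        losses[arm] += 1 - reward
    return total

# e.g. thompson_bernoulli([0.2, 0.5, 0.8]): the average reward per pull
# tends toward max(true_probs) as the horizon grows.
```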
Efficient Motion Planning Algorithm for Stochastic Dynamic Systems with Constraints on Probability of Failure
When controlling dynamic systems such as mobile robots in uncertain environments, there is a trade-off between risk and reward. For example, a race car can turn a corner faster by taking a more challenging path. This paper proposes a new approach to planning a control sequence with a guaranteed risk bound. Given a stochastic dynamic model, the problem is to find a control sequence that optimizes ...
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and decides on the strategy for the next iteration. The goal is to maximize the reward by balancing exp...
Publication date: 2015